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1. Introduction 

An important goal of programming language research is to isolate the fundamental concepts of 
languages, those basic ideas that allow us to understand the relationship among various language 
features. This paper examines one of these underlying notions, data type, and presents a meaning 
for this term that allows us to: 

describe a simple treatment of generic or polymorphic procedures that preserves full static 
type-checking and allows unrestricted use of recursion; and 

give a precise meaning to the phrase strong typing, so that Language X is strongly typed can 
be interpreted as a critically important theorem about the semantics of the language. 

This approach to the meaning of data types was used by the authors in the design of the programming 
language Russell [BoehmSO, Demers80a,b] and Russell will be used to present examples below. One 
thesis of this paper is that the semantics of data types presented here has served us well in the 
Russell design. 

The paper is organized as follows. In Section 2 we present the motivation leading to our search 
for a new meaning of data type. Section 3 describes our interpretation of data types as collections 
of operations. Sections 4 and 5 describe how this approach is used in Russell to allow unrestricted 
polymorphism and to provide a semantic justification for the syntactic type-checking rules of Russell. 
The final section closes with some thoughts on the general approach taken in this research. 



2. The Problem 

The particular problem that motivated our study of data types was the difficulty of writing 
generic or polymorphic procedures — procedures that can operate on variables and values of more 
than one type — in existing Algol-like languages. For example, consider the simple Pascal procedure 
procedure Swap(var x,y: integer) 
van z: integer; 

begin z := x; x := y; y := z end 
Even though Swap contains no code that depends on the particular properties of the type of its 
arguments, it cannot be used to swap variables of any type other than integer. This can make writing 
subroutine packages extremely tedious — the more general-purpose a subroutine is, the more copies 
of it are necessary. The introduction of user-defined data types makes this restriction even more 
telling. How can one apply existing operators to values of a newly-defined type? 

One can, of course, evade this problem, as is done in dynamically-typed languages like LISP 
and SMALLTALK or in typeless languages like BCPL. However, these alternative approaches have 
their own sets of problems. In typeless languages, the common programming blunder of applying 
an operation to an operand of the wrong type is not a detectable error, as the notion of wrong type 
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is absent. In dynamically-typed languages such type errors can be detected, but only when it is too 
late to do anything about them. Additionally, the obligation to check for such errors increases the 
execution cost of every program, even programs using only legal operator/operand combinations. 

Milner [78] has suggested the use of type determination as a means of adding the desired 
flexibility. The basic idea is that there is sufficient information in a program for a compiler to 
determine the type constraints necessary for safe execution; the programmer need not supply any 
type information about the variables in his program. While this approach seems practicable in many 
cases, it too suffers a serious flaw. The degree of flexibility it achieves is less dependent on what 
the programmer writes than on how clever the compiler is. Milner's particular type-determination 
algorithm fails to allow passing generic procedures as arguments, because it assumes that all uses of 
an identifier have the same type. If the programmer could specify the types of parameters, it would 
be possible to write correct programs that violated this assumption. Conversely, there are instances 
in which a programmer wishes a program to have only very limited possibilities for use; this system 
provides no way to prohibit arbitrary uses that do not violate compiler-determined type constraints. 

Our approach to allowing generic operations is to use y, allowing it to swap different variables. 
Why not parameterize Swap with respect to a type, allowing it to swap variables of different types? 

This approach has the advantage of simplicity, but it also poses a basic problem in semantics: 
what does a data type argument mean? The answer to this question should be consistent with our 
present understanding of parameterization and application. We talk about the value of the argument 
when we pass an integer to a function; our meaning of data type should explain what is meant by 
the value of the argument when we pass a data type. Having such a meaning allows us to design a 
language in which data type parameters fit well— they are treated like any other parameters. It is 
also consistent with the dogma of denotational semantics [Tennent77, Scott77] which says that we 
should define the meanings of programs in terms of values in appropriately chosen abstract value 
spaces. 

In the next Section we draw on work in denotational semantics to develop a notion of data 
types as values. As we shall see, this notion allows us to give a very simple interpretation of type 
parameterization and to give precise answers to several other language design questions involving 
data types. We shall discuss one of these points— the meaning of strong typing— in detail, and show 
how our interpretation of data types allows us to give a particularly attractive meaning to this phrase. 



3. Data Types 

Our goal is to give a meaning for the value of a data type parameter. For this purpose the 
common interpretation of data types as sets of values is less appropriate than the following meaning: 

A data type is a collection of named operations that provide an interpretation of values and 
variables of a single universal value space. 



XEROX PARC, CSL-83-5, MARCH 1984 



DATA TYPES ARE VALUES 



We first explain what we mean by interpretation and then discuss how this meaning of data type 
allows us to view data types as values. Our discussion will be informal, but we will suggest how, 
using the work of Scott, these ideas can be made mathematically precise. 

3.1 Data Types as Interpretations of Values 

We begin by assuming the underlying value space to be typeless. Informally that is what we 
mean by a single value space; the meaning of universal will be discussed below. No properties of 
the values themselves allow us to say, for example, that one value is an integer while another is a 
Boolean. This assumption accords well with the hardware of most machines: values are represented 
by (untyped) sequences of bits, and can be partitioned into disjoint sets only by introducing explicit 
tag fields, with the associated overhead in time and space. 

Observe that values in a typeless value space have no inherent meaning as well as no inherent 
type. Continuing our analogy with hardware, the same sequence of bits can be used to represent 
logical values, integers, floating point values, and the programs that manipulate them. Given a 
particular bit string, we cannot say what it means any more than we can say what its type is. The 
most we can say is something of the form if this value is used as an operand of the integer addition 
operator, it will behave as follows. . . . 

How then can we speak about the meaning of a typeless value? By considering the interpretation 
of the value by various operations. For example, a particular value may behave as the identity 
element under the integer addition operator. Certain collections of operations considered together 
may impose a consistent interpretation of the value space. For example, the value that behaves as 
the identity for integer addition also yields itself when used in integer multiplication. Thus, instead 
of saying that a (typed) value is the integer 0, we can say that for a particular choice of integer 
operations an (untyped) value behaves like the integer 0. The same untyped value may also behave 
like the Boolean True and the character 'a', when interpreted by the other collections of operations. 

The notion that a set of operations can impose a consistent interpretation of the values of a 
typeless value space is the idea behind the treatment of data types in Russell. In Russell, values 
themselves have no inherent meaning; instead, meanings are imposed on them by collections of 
operations. We call these collections of operations data types. 

3.2 Data Types as Values 

The question still remains how we can regard the types themselves as values. To answer this, 
we draw on Scott's work in models of the untyped lambda calculus, which provided the inspiration 
for this approach. 

Abstractly, what are the characteristics of the value space we need? Firstly, as argued above, it 
must be typeless. In addition, it must be large— ii must contain the (computable) operations over it. 
Thus types themselves, which are simply finite collections of operations, must be values in the space. 
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Note that the combination of these two requirements means that self-appUcation must be allowable. 
The arguments or result of any operation may be any value (because of the typelessness of the value 
space), so indeed an operation could be used to interpret itself. Such a value space can reasonably 
be called universal. Does one exist, however? 

Intuitively, yes. Returning to our hardware analogy, the store is typeless and (ignoring size 
limitations) allows the implementation of any computable operation. Moreover, the indistinguish- 
ability of program and data allows any operation to be performed on (the representation oO itself. 
A more abstract mathematical formulation of such spaces can be found in [Scott76, 77]. In these 
papers, Scott shows how to construct a value space D that is isomorphic to its own space of continuous 
functions, i.e., that satisfies (up to isomorphism) the equation D = D -> D. The immediate 
importance of such a domain is that it allows a simple definition of the untyped lambda calculus. 
For example, in the lambda expression Xx. x(x), the identifier x simultaneously stands for a 
function and its argument. This makes sense only for a value space satisfying the above isomorphism. 
More generally, the techniques introduced by Scott allow the construction of a variety of rich function 
spaces in which self-application is possible. In particular, one can build a reasonable mathematical 
model of a machine store, including the possibility of a program operating on itself. An example of 
this can be seen in the Russell semantics of [DemersSOb]. 

It remains to be asked what sorts of operations form a data type in an Algol-like language: how 
are variables and values given interpretations? As a simple example, consider a language like Algol60 
with only primitive (unstructured) types. In such a language, we may store values in variables, 
extract values from variables and compose values by applying certain primitive functions. The 
meaning of the "primitive" operations over such a space is taken as the meaning of a data type, 
since it is by these operations that the underlying value space of Algol60 programs is manipulated. 

For example, consider the following Algol program fragment: 

integer x,y; x := 0; y := x. 

The meaning (or denotation) of integer must provide at least the following: 

The meaning of value extraction (so we know how to take the value of the variable x on the 
right-hand side of the second assignment), 

the meaning of assignment (so we know how to store an integer value in y), and 

the meaning of the constant (or nullary function) 0. 

The set of operations which is needed to provide an interpretation is a language-dependent matter. 
In [Donahue79], we give a semantics for a polymorphic lambda calculus that uses a single function 
(a retraction) as the meaning of a data type. More operations are needed for data types in a 
language like Pascal, but the same basic approach works in both cases. 
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3.3 Comparison With Other Approaches 

The meaning of data type presented above is similar to the idea of algebraic specifications 
described in [Guttag77] and [Goguen76] in its focus on the operations of a type. There is, however, 
a subtle but important difference between these approaches. In the algebraic approach, one assumes 
the existence of a collection of carrier sets, and most of the literature suggests that these carrier sets 
may be assumed to be disjoint. Our approach is to choose a single typeless carrier set that not only 
represents the values of every type, but also represents the operations of all types. 

Our approach also has a (less obvious) connection with the common types are sets of values 
approach: we can find a way to treat our collection of operations as defining a set of values. In the 
same way that the integer operations must be related by simple algebraic laws, so must the other 
operations of a data type, including those of assignment and value extraction. We can make these 
properties precise by a straightforward use of standard denotational semantics (see [Milne76, 
Tennent77]). 

To give a mathematical meaning to assignment, we first need a model of a machine store; we 
will use a function space S such that S = Loc ->• Val where Loo is some domain of locations and 
Val is the domain of storable values (in Russell, our universal value space). Now we can define 
value extraction by a function: 

ValueOf: [ Loc X S ] -* Val 
that returns the value in the given location of the store. Similarly, assignment can be defined by a 
function: 

Update: [ Loc X Val ] -► S ^ S 
that produces a new store by changing the contents of the given location to the new value. Now if 
we look at this pair of operations, we can see that for all 1 and s : 

ValOfUpdate = Xv. ValueOf(l, UpdateCl, v)(s)) 
(a function of type Val->^Val) must be the same function as ValOfUpdate » ValOfUpdate. In 
words, assigning a value to a variable and then taking the value of the variable must produce the 
same result as performing the assignment, taking the value of the variable and then performing the 
assignment and extraction again with the value produced. Note that this is a weaker condition than 
saying that ValueOf must always produce the value previously assigned, i.e., 

ValueOf(l, Update(l, s)(v)) = v 
in that we leave open the possibility that at least some values will be altered by assignment. 

In the parlance of denotational semantics, the function ValOfUpdate is a retraction, a. function 
f such that f = fof. A retraction in D-*D has the very important property that it "collapses" D 
onto the range of f and is the identity function on each element of its range. Thus, f can be seen 
as mapping every element of D into its image in the subspace given by the range of f. Elements of 
Val in this subspace will be unmodified by assignment and extraction, while the remaining elements 
will somehow be projected into the subspace. 

The set of values of a type can now be found in the range of this retraction: it is the set of 
values that may legally be assigned to variables of the type. In fact, this is the definition of data 
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type given in [Jensen75]. Note that for any data type, there may be many Val ueOf and Update 
operations satisfying this retraction property. Below we will discuss the use of static type-checking 
as a means of hiding the implementation decision of which of the allowable ValueOf and Update 
operations are actually used. 

We have now described what is meant by universal space and collection of operations providing 
an interpretation, i.e., we have said what we mean by data type. Moreover, this meaning of type 
allows us to give a straightforward semantics of type parameters. Consider the Pascal-like definition: 

Identity == func[T: type; x: val T] val T {(*return*) x} end 
where we have parameterized Identity with respect to a type T. The meaning of the parameters 
can now be understood as follows: 

The value parameter x stands for a value from some universal value space, which will be 
interpreted in the body of the function by the operations of the type T; and 

The type parameter T stands for a set of operations used within the body of Identity to 
interpret values of the universal space. This set of operations can be treated as a natural 
extension of the procedure and function parameters allowed in many existing languages. 

By viewing type parameters this way, we can give meaning to type-parameterized constructs 
independent of any particular arguments supplied to them. In the next section, we consider this 
point in more detail, giving the syntax and semantics of polymorphic constructs in Russell. In 
Section 5, we describe a meaning for strong typing that can be used with this approach to justify 
type-checking rules. 



4. Type Checking and Polymorphism in Russell 

We now show how the principles described above were applied in the treatment of polymorphism 
and type checking in Russell. 



4.1 Signatures 

In our view, types specify interpretations of data: no variable or value in a program has meaning 
until we have specified how it is to be interpreted. To define the interpretation of the value of an 
expression in a Russell program, we associate with each identifier or expression a syntactic type, or 
signature, similar to a program type of [Reynolds78]. The signature of an expression describes how 
the value of that expression should be interpreted by identifying the operations that may be performed 
on it. The criterion for type correctness of a Russell program is that signatures can be assigned 
uniquely according to the rules described below; thus, it is more proper to speak of the signature 
correctness of a Russell program. 
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Every value is interpreted by a Russell program in one of three ways: 

as a function, which may be applied to arguments to yield a result; 

as a type, which by the definition of the previous section consists of a finite set of named 
function values; 

as a data item (variable or value) to be interpreted by the operations of some type. 

These three possible ways to interpret a value are mirrored in the syntax of signatures: 



Sig 

DataSig 

OpSig 

FuncSig 

TypeSig 



DataSig | OpSig 

var Exp | val Exp 

FuncSig | TypeSig 

func[idi: Sigi; ...; idp: Sign] Sigo 

type ido [idi: OpSign ...; idpi OpSign] 



A function signature includes formal parameter signatures and a result signature, as one might 
expect. It also includes formal parameter names; these are necessary for the description of poly- 
morphic functions, as will become clear later. 

A type signature specifies the names and signatures of the operations that make up the type. 
Since a Russell type must provide all the information necessary to interpret a variable or value, types 
in Russell provide operations such as assignment and value extraction that are considered primitive 
in other languages. Some operations, like assignment, are common to most types; but in general 
each Russell type comprises a different set of operations. Note that each type signature includes a 
bound variable (ido in the syntax above); the reason for this will be discussed later. 

Finally, a data signature consists of a var or val indication together with an expression for a 
type whose operations should be used to interpret the value. An important property of the Russell 
signature correctness rules is that they will never assign a signature of the form var e or val e 
unless the expression e can be assigned a type signature. 

The following example illustrates how a polymorphic version of the swap function of our earlier 
example would be written in Russell: 

Swap == func [T: type t [New: func[] var t; 
ValOf: func[var t] val t; 
<- : func[var t; val t] val t ]; 
X, y: var T] 
{let z: var T == T$New[] 
in 

[z] T$<- [T$ValOf[x]]; [x] T$<- [T$ValOf [y]] ; [y] T$<- [T$ValOf[z]] 
ni} 
The procedure heading of Swap (which is also the signature of Swap) specifies that the type 
parameter T provides operations named New, Val Of and <-. The parameters x and y are variables 
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to be interpreted by the operations of T. The body of this function shows how the operations 
provided by T are used. The declaration: 

z: var T == T$New[] 
declares z to be a "new T variable" by binding it to a value produced by applying a function named 
New selected from T. The existence of var-producing functions like New makes it possible for an 
operation of T to provide the meaning of "variable allocation" for the type. Similarly, the three as- 
signments that swap the values of x and y are performed by a function named *- selected from T. 

This simple example suggests how type checking of Russell programs is done using signatures. 
The signatures of the parameters x and y, var T, indicate that they are to be interpreted by the 
operations of the parameter T; the signature of T indicates that it is to be interpreted as a type 
providing all the operations required in the body of Swap. The signature correctness rules described 
below ensure that no misinterpretation takes place. As we show in the next section, signatures contain 
enough information to allow polymorphic functions — even recursive ones — to be type checked, 
without re-instantiating the function for each separate invocation of it. 

Even more important than the ease of signature-checking polymorphic functions in Russell is 
the fact that the meanings of such functions are straightforward. Most treatments of polymorphic 
operations view them as "macros" to be expanded independently for each distinct type argument. 
This approach causes problems when combined with recursion, as the following example shows: 
R == func [n: val integer; T: type t[] ] val integer 

{if n > => R[n-1, Array[l , 10 ,T]] # n <= => 17 fi} 
This procedure is clearly signature-correct: the type parameter specifies no operations, so any type 
expression is a legal argument to R. Also, execution of the program clearly terminates. But simple 
textual macro expansion of the function R can never terminate, because the recursive calls of R use 
progressively more complicated type arguments. Using our treatment of data types as values, which 
gives a uniform semantics to all forms of procedure application, recursive procedures are no more 
difficult to type-check or to interpret than nonrecursive ones. 

Although we have discussed only polymorphic functions that produce simple values, there is no 
reason to prohibit functions from accepting parameters and returning values of any signature 
whatsoever. Since all values exist in the same universal space, the semantics of parameters and 
function results is completely uniform. For example, there is no special form of "parameterized data 
type" in Russell; instead, one simply writes a function that returns a data type. 

In Russell, one finds a degree of "type completeness" not common in programming languages — 
anything a programmer can write can be passed as an argument to something of even higher type. 
Guaranteeing that the combining forms of abstraction and application can be used in an unlimited 
fashion is one response to the recent arguments of Backus[78] about the weakness of combining 
forms in vonNeumann languages. (A further discussion of the importance of type- completeness can 
be found in [DemersSOb].) 

We now turn to a more careful description of Russell signature-checking and show how it allows 
us to give precise mathematical meaning to the phrase "strongly typed." 
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4.2 Structure of the Signature Correctness Rules 

We now present the essentials of the Russell signature correctness rules in a style similar to 
[Bates82, Demers83, Martin-Lof79]. Roughly speaking, the system is a collection of logical inference 
rules that are used to prove formulas stating that a given expression has a given signature. There is 
a straightforward, efficient decision procedure for the system, making it suitable for use in a language 
implementation. 

4.2.1 Formulas 

The formulas of our system include: 

Typings. These are formulas of the form e ~ S, which assert that e is a legal Russell 
expression whose signature is S. 

Legality Assertions. These are formulas of the form 1 egal S, which assert that S is a legal 
signature: if S has the form var e or val e, then e ~ type t[] will be deducible and 
the value produced by evaluating e must not depend on the contents of the store. 

Signature Matchings. These are formulas of the form Si < S2, which assert that Si is more 
restrictive than S2: whenever e ~ Si is deducible, e ~ S2 will be deducible as well. 

4.2.2 Environments 

An environment F is a set of typings in which only simple identifiers (not arbitrary expressions) 
may appear on the left hand sides. Environment T {uniquely) defines an identifier x if there is a 
(unique) signature S such that F contains the formula x ~ S. F is fidnctional if every identifier it 
defines is uniquely defined, and closed if it defines every identifier that occurs free in any of its 
formulas. Intuitively, an environment contains the signatures of all identifiers that have been declared 
in a Russell program. Thus, every environment we use will be functional (every identifier must have 
a unique signature) and closed (the signature of an identifier may not contain undefined identifiers). 

Given F and a set of program identifiers xi through Xn, we define F / xi, ..., Xp, to be the 
maximal closed subset of F not defining any of xi through Xp. This construction simply deletes 
from F definitions of xi through Xp and any formulas that depend on them. We also define: 

(F, Xi ~ Si, ..., Xn ~ Sn) =def r / Xi, ..., Xp U {Xi ~ Si, ..., Xp ~ Sp} 
which gives the effect of declaring new program identifiers in function constructions or let- 
binding — first all references to previous instances of the identifiers are deleted from the environment, 
and then new typings are added. 

Finally, we define F^oVar to be F with all variable identifiers (identifiers with var signatures) 
eliminated as above. The Russell signature correctness rules will assign a tunc or type signature 
to an expression only if that signature can be deduced from FNoVar- The reasons for this restriction 
will be discussed in Section 5. 
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4.2.3 Goals, Inference, and Signature Correctness 

A goal G has the form T I— F, with the meaning that the formula F is a consequence of the 
typings in V. An inference rule has the form: 

Gi« . . . . Gp 

G 
with the meaning that from Gi through Gn we may conclude G. We call G the conclusion of the 
rule, and Gi through Gn its hypotheses or subgoals. A theorem is a goal provable using the inference 
rules given below. A Russell program e will be said to be signature correct with respect to a given 
initial environment Fq if, and only if, there is a theorem of the form Fq I— e ~ S for some signature 
S. 

In the sections that follow, we adhere to the conventions that variables d , e , f , ... represent 
expressions, variables t , u , . . . , z represent new identifiers, and an expression of the form 
d[ei/xi, ..., Gp/Xp] (which represents the result of simultaneously substituting ei through ep for 
xi through Xp in d), is legal only if no capture occurs. 

4.3 Basic Rules 

We begin with some simple rules illustrating how conventional Pascal-like type checking can be 
expressed in this system. First we consider sequential composition. 

Sequential Composition 

F I- ei ~ Si, ..., 

r h- ep ~ Sp 

F I- (ei; ...; ep) ~ Sp 
This rule states that if each of ei through ep is signature-correct in a given environment F, then 
their sequential composition ( ei ; . . . ; ep ) i s signature-correct in that environment, and has the 
same signature as ep. This corresponds to our intuition about the meaning of sequential composition: 
provided no type error occurs during evaluation of ei through ep-i, their values are simply discarded; 
evaluating the compound expression produces the value (and thus has the signature) of ep. 

Russell includes conditional expressions, similar to guarded commands [Dijkstra75], with the 
following signature correctness inference rule. 

Conditional 

F I— ci ~ val boolean, F I— ei ~ S, ..., 
F I— Cp ~ val boolean, F I— ep ~ S 

F I— if Ci => ei # ... # Cp ^ ep f i ~ S 
This rule states that each condition part Ci must have signature val boolean, and the expressions 
ei must be signature correct and have identical signatures. Again, this rule corresponds to our 
intuition about the meaning of conditionals. 
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Equally straightforward rules apply to many of the other Russell constructs. The novelty of the 
Russell rules lies in their treatment of application (of a function to arguments) and selection (of a 
component of a type), which we describe below. 

4.4 Application 

A simple Pascal-like rule for signature correctness of (non-polymorphic) function applications 
is: 

Simple Application 

r I— e ~ func[xi: Si; ...; Xp: Sn] Sq, 

r H ei ~ Si, ..., r I- en ~ Sn 

r h- e[ei, ..., en] ~ Sq 
The intuition behind this rule is clear: a function may legally be applied to a list of arguments 
provided each argument signature can be shown to match the corresponding formal parameter 
signature. However, this simple rule cannot handle application of a polymorphic function, in which 
some parameter signatures may contain occurrences of other parameters. For example, in the 
signature: 

func[T: type t[]; x: val T] val T 

of the polymorphic identity function, the signature val T of parameter x contains an occurrence of 
parameter T, A polymorphic function signature like this one can be thought of as specifying 
relationships that must exist among legal arguments to the function. To treat polymorphic functions 
requires a more powerful rule: 

Russell Application 

r I- e ~ func [ xi: Si; ...; Xn: Sn ] Sq, 

r I- ei ~ Si[ei/xi, ..., en/xn], ..., 

r I- en ~ Sn[ei/Xi, ..., en/Xn] 

r h- e[ ei, ..., en ] ~ So[ei/xi, ..., en/Xn] 

The substitutions in this rule allow it to handle mutually dependent parameter and result 
signatures. For example, the application Identity [integer, 17] can be shown to be signature 
correct and to produce an integer result by instantiating the above rule as: 

F I— Identity ~ func[T: type t[]; x: val T] val T 

F I— integer ~ type t[] [integer/T, 17/x] 

F h- 17 ~ val T[integer/T, 17/x] 

F I— legal val T[integer/T. 17/x] 

F I— Identity[integer, 17] ~ val T[integer/T, 17/x] 
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Performing the indicated substitutions, we obtain: 

r h- Identity ~ func[T: type t[]; x: val T] val T 

r I— integer ~ type t[] 

r h- 17 ~ val integer 

r I— legal val integer 

r I— Iclentity[integer , 17] ~ val integer 
The conclusion of this inference is just what one would expect— the polymorphic identity function 
may be applied to an integer to produce an integer. The first three subgoals are also the natural 
ones showing that the function and its arguments are signature correct. The final subgoal, legal 
val integer, is more subtle. Intuitively, it guarantees that the result type of the application does 
not depend on the values of any variables in the store. We will return to this point when we discuss 
legality assertions and the substitution property below. (Note this is necessary only for the result 
signature; a property of the rules is that if whenever one can conclude e ~ S one can also conclude 
legal S.) 

4.5 Component Selection 

We have seen that the signature correctness rules handle applications of polymorphic functions 
by syntactic substitution of argument expressions for parameters. Similar substitutions occur in the 
rule for selecting a component of a type. 

Selection 

r I- e ~ type t [..., x: S, ...] , T h- legal S[e/t] 

r I- e$x ~ S[e/t] 
The substitution that occurs in the conclusion of this rule allows the signature of a component of a 
type to refer to the type itself 

For example, consider a selection such as integer$+". The signature of the built-in type 
integer is: 

type t [... ; +: func[x,y: val t] val t; ...] 

To produce the signature of integer$+, the above rule can be instantiated as: 
r 1— integer ~ type t [...; +: func[x,y: val t] val t; ...] 
r I— legal func[x,y: val t] val t[integer/t] 
r I— integer$+ ~ func[x,y: val t] val t[integer/t] 

Performing the indicated substitutions yields: 

r I— integer ~ type t [... ; + : func[x,y: val t] val t; ...] 

F I— legal func[x,y: val integer] val integer 

F I— integer$+ ~ func[x,y: val integer] val integer 
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It is clear that the two subgoals will be deducible for any reasonable choice of environment F, so 
the signature of integer$+ is: 

func[x,y: val integer] val integer 
as one would expect. If real were a type with the same signature as integer, the signature of 
real$+ would be: 

func[x,y: val real] val real 
again, just as one would expect. It is the substitution of a type expression (real or integer in 
this case) into the signatures of components selected from it that allows identically-named components 
selected from types with identical signatures to have distinct signatures. 

Like the function application rule discussed in the previous section, this rule includes a subgoal 
of the form legal S. We now turn to a discussion of why these goals are necessary. 

4.6 Signature Legality 

The greatest strength of the Russell signature system is its generality. For example, it is simple 
to construct a Russell function T whose signature is given by: 

T: func [i: val integer] type t [] 
Such functions provide the only parameterized type mechanism that is needed in Russell, However, 
combining such generality with variables and a modifiable store requires great care. As an illustration 
of this, consider the following program fragment: 
let x: var T[integer$ValOf [i ]] == . . . 
in . . . 

[i] integers*- [integer$ValOf [i] integer$+ [1]] 
let y: var T[integer$ValOf[i]] == . . . 
in . . . 

[x] T[integer$ValOf[i]]$*- 
[T[integer$ValOf[i]]$ValOf[y]] 
where i has signature var integer and T is the type-returning function described above. The 
signatures of x and y in this program fragment are textually identical — both variables have signature: 

var T[integer$Va10f[i]]. 
Nevertheless, the assignment of y to x in the last line clearly should not be considered signature- 
correct, since the value of i, which occurs free in the signature, changes between the introduction 
of X and y. 

It was the need to detect and prohibit errors like this that led us to introduce legality assertions 
into the Russell signature rules. The basic rule for inferring that a data signature is legal is the 
following. 
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Data Signature Legality 

TNoVar t- e ~ type t [ ] 

r I— legal van e, F I— legal val e 

The intuition behind this rule is that if one can deduce e ~ type t [ ] without referring to 
any van identifiers in the environment, then the result of evaluating e is independent of the contents 
of memory— in particular, if e is evaluated several times the result will always be the same. Thus, 
the value of an expression with signature var e or val e can be interpreted by operations selected 
from the value of e; identically named operations obtained in this way will always be equivalent 
since the result of evaluating e itself is always the same. 

For completeness, we require rules for legality of function and type signatures as well as data 
signatures. 

Function Signature Legality 

(F, xi ~ Si,..., Xn ~ Sn) I- legal Si 

(F, xi ~ Si, ... , Xn ~ Sn) H legal Sp 

(F, xi ~ Si, ... , Xn ~ Sn) I- legal S 

F I— legal func[xi: Si... Xn: Sn] S 

This rule states that a function signature is legal if the parameter and result signatures are legal. 
Since polymorphic function signatures allow parameter names to occur in the signatures of other 
parameters and in the signature of the result, we cannot in general deduce that the parameter and 
signatures are legal without first adding the parameters to the environment. For example, to deduce 
that the signature: 

func[T: type t []; x: val T] val T 

of the polymorphic identity function is legal, the above rule would be instantiated as: 

(F, T ~ type t [] , x ~ val T) I- legal type t [] 

(F, T ~ type t [] , x ~ val T) I- legal val T 

(F, T ~ type t [] , x ~ val T) h- legal val T 

F I— legal func[T: type t []; x: val T] val T. 
Note that the second and third subgoals, showing legality of the signatures of the parameter x and 
of the result, are identical. 

The rule for legality of type signatures is similar. 

Type Signature Legality 

(F, t ~ type t[xi: Si . . . Xn: Sn]) I— legal Si, ..., 

(F, t ~ type t[Xi: Si . . . Xn: Sn]) t— legal Sn 

F I— legal type t[xi: Si... Xn: Sn] 
Here the local name t is added to the environment to allow us to deduce that the component 
signatures are legal. 
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4.7 Matching Function and Type Signatures 

The function application rule given above is quite restrictive — it requires that argument and 
parameter signatures in an application be textually identical. In view of the many arbitrarily-chosen 
bound identifiers that occur in signatures, it is desirable to relax this restriction. The following rules, 
which embody the signature calculus of [DemersSOa], accomplish this. 

Signature Weakening 

F I- $2 < Si 

r I- e ~ $2 

ri- e ~ Si 
This rule states an obvious property: if S2 is more restrictive than Si and e can be given signature 
S2, then it can also be given signature Si. 

Renaming 

F f— legal func [xi: Si ; ...; x^: Sk] Sp 

Fl- func[xi:Si;...; XkiSk] So < func[...; yi :Si[..., yj/Xj, ...]; ...] So[..., yj/Xj, ...] 

where the identifiers y^ through y^ are new. 

Parameter names may be replaced uniformly in function signatures. 

Reordering 

F h- legal type t [xi: Ri; ...; Xp: Rn] 

F I- type t [xi: Ri; ...; Xp: Rn] < type t [yi: Si ; ...; yp: Sp] 

whenever the (unordered) set of <Xt , R^> pairs is identical to the (unordered) set of <yj , Sj> pairs. 
The order of presentation of components in a type signature is irrelevant. 

Forgetting 

F I— legal type t xi: Ri; ...; Xp: Rn] 

F h- type t [xi: Ri; ...; Xpi Rn] < type t [yi: Si ; ...; yn,: S^] 
whenever the (unordered) set of <Xi , R^y pairs is a superset of the (unordered) set of <yj , Sj> pairs. 
A type signature can be made weaker (less restrictive) by eliminating (or forgetting) some of its 
operations. 

The above rules do not greatly increase the complexity of signature checking but make the 
language much easier to use. For example, the signature of the builtin type Integer as given in 
[BoehmSO] is: 
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type t [ New: func[] var t; 

<-: func[x: var t; y: val t] val t; 
ValOf: func[ x: var t] val t; 
+ : func[x,y: val t] val t 
func[x,y: val t] val t 
func[x,y: val t] val t 
/ : func[x,y: val t] val t; ... ] 
This can easily be shown to match the simpler signature: 
type t [ New: func[] var t; 

ValOf: func[x: var t] val t; 
<-: func[x: var t; y: val t] val t ] 
which occurs as a parameter signature in the Russell Swap function given earlier. Thus, the built-in 
type Integer is a legal argument to Swap, so that Swap can be used to exchange the values of 
two Integer variables. (Adding the forgetting rule is truly a convenience. The Russell language 
provides general facilities for producing new type values by modifying the set of operations provided 
by some existing type; this ability to perform type modification can be used to achieve the effect of 
using the forgetting rule, although it is somewhat cumbersome.) 



4.8 Constructions 

There is one final class of signature correctness rules — rules for the construction of types and 
functions (whose values include non-local references). Here we present only the rule for function 
constructions: 

Function Construction 

r I— legal func[ xi: Si . . . Xp: Sn ] Sq 

( TNoVar. Xi ~ Si, ..., Xp ~ Sp) l— e ~ Sp 

ri— func[xi: Si... Xp: Sp] { e } ~ func[xi: Si... Xp: Sp] Sq 
The important thing to note about this rule is that the signature of the function body e must be 
deducible from r^oVar and the signatures of the parameters; that is, the function body may not 
import any var identifiers. This rule, called the import rule in [DemersSOa], ensures that the value 
returned by a function application depends only on its arguments — there are no hidden dependencies 
on the contents of memory. 



5. Strong Typing 

The Russell signature checking rules make the language "strongly typed" in the usual sense of 
the phrase: each identifier is given an explicit type and programs with mismatched argument/ 
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parameter types are disallowed. An important question, though, is the relation between the syntactic 
signature-matching rules of the previous section and the semantics of data types discussed in Section 
3. The claim that we would like to make is that signature-matching rules given above are not 
arbitrary; they are sufficient to preserve important properties of the semantics of Russell. Below we 
will argue that the signature-matching rules given above satisfy a correctness criterion based on the 
semantics of data types that we have developed in the previous sections of this paper. This form of 
argument gives a much stronger meaning to the claim Russell is strongly typed. 

The easiest way to understand the effect of the Russell signature-matching rules is to consider 
the effect of evaluating a program that violated them. For instance, consider the following very 
simple example: 

let X == Integer$New[] 

in [x] Integers^ [Boolean$True[]]; if x=0 =>3#x=?t0=>17fini 
(we have abbreviated the program by omitting obvious component selections from the type Integer 
in the final conditional statement; instead of [x] Integer$= [lnteger$0[]], we have simply 
written x = 0). The assignment statement x <- Boolean$True[], which uses Integer assignment 
to assign a Boolean value is clearly a violation of the signature-matching rules of the previous 
section. But, if we just ignore the signature violation and consider the evaluation of this program, 
our typeless semantics argues that it will terminate and produce either 3 or 17. Which value is 
produced, however, depends on the encodings of the value that represents the Boolean True and 
the procedure used by Integer to test for 0; in other words, the result depends on how the 
Integer operations misinterpret a value that was intended only to be used by the Boolean 
operations. Note that the example of the section on signature legality shows that guaranteeing that 
no misinterpretation occurs is not as simple as checking the syntactic identity of signatures in Russell. 

Another way to look at the possibility of misinterpretation is in terms of "representation 
independence." If it is impossible for Integer (or any other type) to interpret values intended 
solely for the Boolean operations, then the Boolean operations can be implemented using any 
choice of representation that is convenient (as long as the abstract properties of the operations are 
satisfied). This representation choice is safely encapsulated inside the collection of Boolean 
operations— should it later be advantageous to change it, the semantics of any legal Russell program 
will not change. This idea of justifying type-checking rules in terms of representation independence 
appears in [Reynolds74] (which was influential in the formulation of our ideas); since then it has 
also been discussed in [Donahue79, McCracken79, Haynes82]. 

Obviously representation independence can be destroyed by primitives like the Mesa loophole 
operation which allows any value to masquerade as one to be interpreted by any type. Assuming 
that Russell has no such operations, we would like to know that the signature-checking rules preserve 
representation independence. Looking at the signature-checking rules of the previous section, we 
see that there are three things to be checked: 

1. that the rules for the combining forms are consistent with their semantics. If one had 
been foolish enough to define the signature-checking rule for conditionals as: 
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r I— ci ~ val boolean, F h- ei ~ S, ..., 

r h- Cp ~ val boolean, F t— ep ~ S 

F h- if ci => ei # ... # Cp =^ Bn f i ~ val Integer 

then representation independence would dearly be destroyed. Such a verification is generally 

a straightforward task. 

2. that the rules for matching function and type signatures given in Section 4.7 preserve 
representation independence. It is clear that renaming and reordering preserve the semantic 
intent of the signature. Forgetting, on the other hand, produces a signature with more limited 
opportunities for use and the signature weakening rule goes the right way (stronger signatures 
can be weakened), so the coercion suggested by forgetting (throwing away some of the 
operations of a type) is consistent with the signature-matching rules. 

3. that the rules are sufficient to rule out the possible misinterpretations that might arise 
should it be possible for the same type expression to denote different types in the same scope 
(as discussed in Section 4.7). The Russell rules, which manipulate type expressions as syntactic 
objects, can only work if the language is constrained to ensure that textually identical type 
expressions in the same scope always denote semantically equivalent type values. 

We can informally argue that the signature-checking rules ensure that syntactic identity guarantees 
semantic equivalence for type expression based on the in variance of the meanings of Russell programs 
under certain syntactic transformations. These transformations can be viewed as equations or axioms 
constraining the behavior of programs. A collection of such equations for Russell can be found in 
[Demers83]. Their relevance to the type checking rules is described below. 

Russell has been designed to guarantee that expressions in the language have the substitution 
property, described as follows. A program fragment of the form: 

letx==ain ^[x/y] ni 
is semantically equivalent (i.e., produces the same value and has the same effect on the state) to the 
fragment: 

letx==«in p[a/y] ni 
whenever the expression a is variable- free (i.e., where none of the free identifiers of a has var 
signature; equivalently if a can be given a signature using an environment F^oVar for some F). 
Informally, this rule allows ")S -reduction" of variable-free declarations — within the scope of a 
declaration, any or all occurrences of its left hand side (x) may be replaced by its right hand side 
(a) without affecting the meaning of the program fragment. Thus, evaluations of identical 
variable-free denotations must produce semantically identical values and must be free of observable 
side-effects. And since all type expressions that appear in signatures must be variable-free by the 
signature legality rule, the Russell rules preserve representation independence. Another consequence 
of this rule is that variable-free Russell programs, like the lambda calculus, have the Church-Rosser 
property; in particular, terminating programs cannot distinguish between call-by- value and 
call-by-name semantics. 
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The substitution property is the reason for the import restriction used in the signature-checking 
rule for function constructions: no free identifier in the body of a function may have var signature 
(the rules also applies to type constructions). By making operation expressions (functions and types) 
variable-free, we can have the rather strong substitution property defined above. The only variables 
that an expression may reference are those that appear explicitly within it; there can be no hidden 
state accessed through functions that manipulate global variables. 

This is admittedly a restrictive rule, as it prevents functions from inspecting or modifying global 
variables and prohibits applications like: 

Array [1, N, integer] 
where N is a variable. The rule does not, however, prevent obtaining the intended effect of the 
above application; it is simply necessary to introduce a new identifier and bind it to the current 
value of N: 

let VN == ValueOf N] in... Array[l, VN, integer] ... 
Less obviously, this restriction also rules out certain primitives in the language. For example, if we 
added a "pointer dereferencing" operation in the language with the natural signature: 

t: func[p: val REF Integer] var Integer 
then one could write the following function, which would be perfectly legal according to the signature 
rules given above but would destroy the substitution property for variable free expressions: 
func[p: val REF Integer] val Integer 
{ Integer$ValOf[pt] } Note that this function only traffics in vals. 
Providing a less restrictive treatment of variables while preserving the ability to do purely syntactic 
signature-matching remains a hard problem. 



6. Conclusions 

In attempting to define terms like "data type," language designers should keep in mind the spirit 
in which such definitions should be given. We quote from Polya [73, pg. 86]: 

The mathematician [or the language designer] is not concerned with the current meaning of 
his technical terms, at least not primarily concerned with that. The mathematical definition 
creates the mathematical meaning. 

Thus, when we define a term, we should seek a definition that creates a useful meaning, a meaning 
that gives insight into language features or language design decisions. On the basis of the preceding 
discussion, we feel our definition of data type as a set of operations providing an interpretation of 
values satisfies this criterion of utility: 

This treatment of types allows us to give a straightforward meaning to polymorphism. Types 
are values and are legal arguments to procedures, functions or types. Moreover, as we argued 
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in Section 3, our meaning of types subsumes the usual meaning of types as sets of values; it 
all depends on whether one looks at the retraction or at its range. 

As this definition explicitiy rules out type errors as semantic notions (values do not have 
types), the role of syntactic type-checking must be carefully explained. In Section 5, we 
presented a justification of the Russell signature matching rules in terms of a fundamental 
property of the semantics — representation independence. 

Our study of data types has also convinced us that ignoring the possibility of polymorphism is 
a false simplification in language design. If a designer has a clear conception of what "data type" 
means to him, then the meaning of polymorphic constructs should be obvious. Indeed, the question 
of the meaning of type parameters seems to be a perfect touchstone for one's understanding of types. 
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